…richment + idempotency fix
Adds a public-source research subroutine that runs at firstboot before
the intro DM goes out. Pulls signals from the GitHub public REST API,
the personal site (og: meta tags), and the LinkedIn public profile page
when a URL is supplied. Composes up to three short bullets with source
citations, capped at 280 chars each, and appends them to the intro DM
under a "What I learned about you so far" subhead. The same bullets are
injected into the onboarding system-prompt overlay so the agent can
reference what it learned in the first conversation.
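
A minimal sketch of the two steps described above: reading og: meta tags and composing capped bullets. The names `Signal`, `parseOgTags`, and `composeBullets` are hypothetical and not taken from the PR. The sketch assumes double-quoted attributes with `property` before `content`, and it assumes the 280-character cap includes the source citation; the commit message does not say either way.

```typescript
// Hypothetical shape for one piece of research; not the PR's actual type.
interface Signal {
  source: string; // e.g. "github", "site", "linkedin"
  text: string;
}

const MAX_BULLETS = 3;
const MAX_BULLET_CHARS = 280;

// Collect og:* meta tags from an HTML string.
// Assumption: attributes are double-quoted and `property` precedes `content`.
function parseOgTags(html: string): Record<string, string> {
  const tags: Record<string, string> = {};
  const re = /<meta\s+[^>]*property="og:([^"]+)"[^>]*content="([^"]*)"[^>]*>/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    const key = m[1];
    const value = m[2];
    if (key !== undefined && value !== undefined) tags[key] = value;
  }
  return tags;
}

// Build at most three bullets, each at most 280 characters including the
// trailing citation. Returns null when no signal has usable text.
function composeBullets(signals: Signal[]): string[] | null {
  const bullets: string[] = [];
  for (const s of signals) {
    const text = s.text.trim();
    if (text === "") continue;
    const suffix = ` (source: ${s.source})`;
    const room = MAX_BULLET_CHARS - suffix.length;
    const body = text.length > room ? text.slice(0, room - 1).trimEnd() + "…" : text;
    bullets.push(body + suffix);
    if (bullets.length === MAX_BULLETS) break;
  }
  return bullets.length > 0 ? bullets : null;
}
```

Returning `null` rather than an empty array mirrors the "don't fabricate" rule: the caller can drop the whole subhead when there is nothing to cite.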
Architectural invariants enforced:
- Public sources only. No authenticated API calls. No LinkedIn auth
scraping that violates ToS. We fetch the LinkedIn public profile
anonymously and read whatever og: tags it serves; HTTP 999, 403, or
any non-200 means we move on.
- Time-bounded to 15 seconds total via AbortSignal.timeout. A slow
source cannot hold the firstboot DM hostage.
- Per-fetch timeout of 4 seconds so a single hang does not eat the
global budget.
- Don't fabricate. If every probe is empty, the subroutine returns
bullets: null and the intro DM renders without the "What I learned"
section.
- Plaintext discipline. The owner email never appears in a bullet,
never gets logged, and is not echoed back to the user.
- Public mailbox domains (gmail, outlook, etc.) are skipped for the
personal-site probe; only custom-domain emails get a fetch.
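
The two time bounds in the list above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `fetchWithBudget`, `probeAll`, and `FetchLike` are hypothetical names, and the fetch function is injected so the sketch can be exercised without network access. The global budget uses `AbortSignal.timeout`; the per-fetch bound uses an `AbortController` that also aborts when the global signal fires.

```typescript
const GLOBAL_BUDGET_MS = 15_000;
const PER_FETCH_MS = 4_000;

// Hypothetical narrow fetch type so a fake can be injected in tests.
type FetchLike = (url: string, init?: { signal?: AbortSignal }) => Promise<Response>;

// Fetch one source. Resolves to null on timeout, network error, or any
// non-200 status, so the caller simply moves on to the next source.
async function fetchWithBudget(
  url: string,
  globalSignal: AbortSignal,
  fetchImpl: FetchLike = fetch,
): Promise<Response | null> {
  if (globalSignal.aborted) return null;
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), PER_FETCH_MS);
  const onGlobalAbort = () => controller.abort();
  globalSignal.addEventListener("abort", onGlobalAbort, { once: true });
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    return res.status === 200 ? res : null;
  } catch {
    return null;
  } finally {
    clearTimeout(timer);
    globalSignal.removeEventListener("abort", onGlobalAbort);
  }
}

// Probe every source in parallel under one shared 15-second budget.
async function probeAll(
  urls: string[],
  fetchImpl: FetchLike = fetch,
): Promise<(Response | null)[]> {
  const globalSignal = AbortSignal.timeout(GLOBAL_BUDGET_MS);
  return Promise.all(urls.map((u) => fetchWithBudget(u, globalSignal, fetchImpl)));
}
```

Because every failure mode collapses to `null`, a hung or blocked source costs at most its own 4 seconds and never throws into the firstboot path.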
Also closes the LOW idempotency bug ("Onboarding re-fires on restart
when evolution generation is 0") via a new firstboot_state ledger
table. The startOnboarding entrypoint short-circuits with
skipped: true when intro_sent_at is set; the ledger is stamped only
AFTER a successful Slack send so a transient Slack failure leaves the
flag clear and the next process start retries.
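
The stamp-after-send ordering can be illustrated with a small sketch. The `FirstbootLedger` interface and `InMemoryLedger` class are hypothetical stand-ins for the firstboot_state table, and the result shape beyond `skipped` is an assumption; only `startOnboarding`, `skipped: true`, and `intro_sent_at` come from the description above.

```typescript
// Hypothetical ledger abstraction standing in for the firstboot_state table.
interface FirstbootLedger {
  getIntroSentAt(): string | null;
  markIntroSent(at: string): void;
}

// In-memory stand-in used only for this sketch.
class InMemoryLedger implements FirstbootLedger {
  private introSentAt: string | null = null;
  getIntroSentAt(): string | null {
    return this.introSentAt;
  }
  markIntroSent(at: string): void {
    this.introSentAt = at;
  }
}

// Assumed result shape; the PR only documents the `skipped` flag.
type OnboardingResult = { skipped: true } | { skipped: false; sent: boolean };

async function startOnboarding(
  ledger: FirstbootLedger,
  sendIntroDm: () => Promise<void>,
): Promise<OnboardingResult> {
  // Short-circuit when a previous process already delivered the intro DM.
  if (ledger.getIntroSentAt() !== null) return { skipped: true };
  try {
    await sendIntroDm();
  } catch {
    // Send failed: leave the ledger unstamped so the next start retries.
    return { skipped: false, sent: false };
  }
  // Stamp only after the send has succeeded.
  ledger.markIntroSent(new Date().toISOString());
  return { skipped: false, sent: true };
}
```

Stamping before the send would make a transient Slack failure permanent, which is why the order matters here.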
Tests: 75 new tests across fetchers, enrich-owner, firstboot state,
the flow integration, and the prompt builder. The full suite is 2382
tests, 2371 pass + 10 skip + 1 todo + 0 fail. bun typecheck clean,
biome lint clean.
Operator TODOs:
- Phase 1 wizard adds an optional PHANTOM_OWNER_LINKEDIN_URL field;
phantomd firstbootStep stamps it into /etc/default/phantom alongside
PHANTOM_OWNER_EMAIL. The field is not required; the research path
works on email + name alone, LinkedIn is a bonus when present.
- PHANTOM_OWNER_RESEARCH_ENABLED=false is the operator escape hatch
if a customer asks for the research subroutine to be turned off entirely.
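
One way the escape hatch could be read, as a hedged sketch: `isOwnerResearchEnabled` is a hypothetical helper, and treating an unset variable as "enabled" is an assumption inferred from the flag being described as an opt-out.

```typescript
// Hypothetical helper. Assumption: research is on unless the variable is
// explicitly set to "false" (case-insensitive, surrounding whitespace ignored).
function isOwnerResearchEnabled(env: Record<string, string | undefined>): boolean {
  const raw = env["PHANTOM_OWNER_RESEARCH_ENABLED"];
  if (raw === undefined) return true;
  return raw.trim().toLowerCase() !== "false";
}
```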
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2a8856f9e3
      body = await res.text();
    } catch {
      return null;
    }

    // Cap parsing at 256KB; real og: tags live in the first 16KB. This
    // also defeats memory-blowup pages that ship multi-MB index.html.
    const head = body.slice(0, 256 * 1024);
Enforce HTML size cap before reading response body
The page-size guard is applied only after await res.text(), which already buffers the full response in memory. A large or malicious index.html can still consume significant memory/CPU before body.slice(0, 256 * 1024) runs, so the intended 256KB protection is ineffective under real network responses. This undermines the firstboot timeout/perf guarantees for owner research and can degrade startup reliability on oversized pages.
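
One possible way to address this, sketched under assumptions: read the body through its stream reader and stop once the byte cap is reached, so the rest of the response is never buffered. `readCappedText` is a hypothetical helper name, not something in the PR; the 256KB figure is the one from the reviewed code.

```typescript
const MAX_HTML_BYTES = 256 * 1024;

// Read at most maxBytes from the response body, then cancel the stream.
// Returns null if reading fails part-way through.
async function readCappedText(
  res: Response,
  maxBytes: number = MAX_HTML_BYTES,
): Promise<string | null> {
  if (res.body === null) return "";
  const reader = res.body.getReader();
  const decoder = new TextDecoder("utf-8");
  let received = 0;
  let text = "";
  try {
    while (received < maxBytes) {
      const { done, value } = await reader.read();
      if (done || value === undefined) break;
      const remaining = maxBytes - received;
      // Trim the final chunk so the total never exceeds the cap.
      const chunk = value.byteLength > remaining ? value.subarray(0, remaining) : value;
      received += chunk.byteLength;
      text += decoder.decode(chunk, { stream: true });
    }
    // Flush any bytes the decoder was holding for a multi-byte character.
    text += decoder.decode();
    return text;
  } catch {
    return null;
  } finally {
    // Stop the download; ignore errors from an already-closed stream.
    await reader.cancel().catch(() => {});
  }
}
```

A cut in the middle of a multi-byte character decodes to a replacement character at the very end of the string, which should be harmless when og: tags sit near the top of the page.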
    if (result.skipped) {
      console.log("[onboarding] firstboot ledger says intro already sent; not re-firing");
    } else {
Rebuild onboarding prompt when intro send is skipped
When startOnboarding returns skipped: true, this branch logs and exits without rebuilding the personalized onboarding prompt, even though needsOnboarding can still be true on restart. In that case the runtime keeps the earlier generic prompt and loses Phase 12 profile/research context for the first conversation after a restart, despite onboarding still being active. This is a regression from the new idempotency path because the skip branch drops prompt enrichment entirely.
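
A sketch of the shape such a fix might take, with loud caveats: `FirstbootDeps`, `runFirstboot`, and `rebuildOnboardingPrompt` are hypothetical names, and the real call site is not shown in the review excerpt. The idea is only that the prompt rebuild depends on `needsOnboarding`, not on whether the intro send was skipped.

```typescript
// Hypothetical dependency bundle; only needsOnboarding and startOnboarding
// are names that appear in the review comment.
interface FirstbootDeps {
  needsOnboarding: boolean;
  startOnboarding(): Promise<{ skipped: boolean }>;
  rebuildOnboardingPrompt(): Promise<void>;
  log(message: string): void;
}

async function runFirstboot(deps: FirstbootDeps): Promise<void> {
  const result = await deps.startOnboarding();
  if (result.skipped) {
    deps.log("[onboarding] firstboot ledger says intro already sent; not re-firing");
  }
  // Rebuild the personalized prompt whenever onboarding is still active,
  // including restarts where the intro DM itself was skipped.
  if (deps.needsOnboarding) {
    await deps.rebuildOnboardingPrompt();
  }
}
```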
Summary
- … firstboot_state SQLite ledger. The intro DM is stamped only after a successful Slack send, so a transient send failure leaves the flag clear and the next process start retries cleanly.
Architectural invariants enforced
- … AbortSignal.timeout. A slow source cannot hold the firstboot DM hostage.
- … AbortController so a single hang does not eat the global budget.
- … bullets: null -> intro DM renders without the section.
Test plan
- bun typecheck clean
- bun run lint (biome) clean
- bun test: 2382 tests, 2371 pass + 10 skip + 1 todo + 0 fail
- … bullets: null, outcome: "empty", intro DM still sends
- … @gmail.com email never triggers a fetch to https://gmail.com
- … startOnboarding call returns skipped: true, no second Slack send
- firstboot_state table appears in the table-list assertion
Operator merge gate
- … ghostwright/phantom PUBLIC PRs.
- … PHANTOM_OWNER_LINKEDIN_URL field to the wizard and have phantomd firstboot stamp it into /etc/default/phantom alongside PHANTOM_OWNER_EMAIL.
- … PHANTOM_OWNER_RESEARCH_ENABLED=false in the per-tenant env.